Abstract
Background In systemic light chain (AL) amyloidosis, hematologic remission does not guarantee organ recovery. Discordance between hematologic and end-organ responses is clinically relevant but remains poorly characterized. We sought to identify predictors of discordant response among patients achieving a very good partial response (VGPR) or better, and to compare the performance of machine learning (ML) with that of logistic regression (LR) models in predicting organ response.
Methods We retrospectively analyzed 376 patients with newly diagnosed AL amyloidosis treated at a tertiary center from 2017 to 2023. Cardiac and renal responses were defined per International Society of Amyloidosis (ISA) criteria and assessed at 6, 12, 18, and 24 months. Patients were classified as early responders or non-responders based on organ recovery at 6 months. Discordance was defined as lack of cardiac or renal response at a given time point despite achieving VGPR or better. Clinical variables associated with discordance at 6 and 12 months were evaluated using univariate logistic regression. ML models were developed using XGBoost, a gradient-boosted decision tree algorithm, and compared with LR across 37 baseline variables. SHAP (SHapley Additive exPlanations) values were used to interpret feature importance.
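The discordance definition above can be sketched as a small classification rule. This is an illustrative sketch only, not the study's code: the `Assessment` record, its field names, and the assumption that "VGPR or better" means the categories CR and VGPR are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: "VGPR or better" is taken to mean complete response (CR) or VGPR.
DEEP_RESPONSES = {"CR", "VGPR"}


@dataclass
class Assessment:
    """One patient's status for one organ at one time point (hypothetical record)."""
    heme_response: str    # e.g. "CR", "VGPR", "PR", "NR"
    organ_involved: bool  # organ involved at baseline
    organ_response: bool  # organ response criteria met at this time point


def is_discordant(a: Assessment) -> Optional[bool]:
    """Return True/False for patients in the discordance analysis, else None.

    Only patients with baseline organ involvement who reached VGPR or better
    are evaluable; among them, discordance is the absence of organ response.
    """
    if not a.organ_involved or a.heme_response not in DEEP_RESPONSES:
        return None  # not part of the discordance denominator
    return not a.organ_response
```

Returning `None` for non-evaluable patients keeps them out of the denominator rather than counting them as concordant.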
Results Among patients with renal involvement (n = 238), early renal response (within 6 months) occurred in 67%. Of these, 86%, 75%, and 71% maintained response at 12, 18, and 24 months. Renal non-responders had subsequent response rates of 8.4%, 19%, and 19%. Among patients with cardiac involvement (n = 273), 59% achieved early response, followed by response rates of 85%, 68%, and 70% at 12, 18, and 24 months. For cardiac non-responders, response was observed in 21%, 26%, and 29%, respectively. Among patients achieving VGPR or better, 35.2% lacked cardiac response and 31.1% lacked renal response at 6 months. At 12 months, 24.5% lacked cardiac response and 29.2% lacked renal response, illustrating the discordance between deep hematologic remission and end-organ recovery.
At 6 months, cardiac discordance was significantly associated with neuropathy (OR 2.98, 95% CI 1.28–7.11; p=0.012) and heart failure exacerbation (OR 1.98, 95% CI 1.06–3.71; p=0.032). At 12 months, male sex (OR 2.49, 95% CI 1.21–5.48; p=0.017) and concurrent renal involvement (OR 3.19, 95% CI 1.55–6.99; p=0.002) were significant predictors. Renal discordance at 6 months was significantly associated with male sex (OR 2.24, 95% CI 1.08–4.90; p=0.035), Black race (OR 2.67, 95% CI 1.18–6.03; p=0.017), and hospitalization during treatment (OR 3.45, 95% CI 1.73–7.08; p<0.001). At 12 months, Black race (OR 3.39, 95% CI 1.54–7.53; p=0.002) and hospitalization (OR 2.02, 95% CI 1.03–3.99; p=0.041) retained significance, while lambda light chain subtype was associated with lower odds of renal discordance (OR 0.34, 95% CI 0.16–0.69; p=0.003).
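For a single binary predictor, the odds ratio from univariate logistic regression equals the cross-product ratio of the corresponding 2×2 table, so the form of the estimates above can be illustrated without a fitting library. The sketch below uses a Wald interval on the log-odds scale; this is an assumption, since the interval method used in the study is not stated, and the function name and any counts passed to it are hypothetical rather than study data.

```python
import math
from statistics import NormalDist


def odds_ratio_wald(a: int, b: int, c: int, d: int, level: float = 0.95):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed, discordant       b: exposed, not discordant
    c: unexposed, discordant     d: unexposed, not discordant
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for a Wald interval")
    log_or = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Two-sided critical value, about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)
```

An odds ratio below 1 with an interval that excludes 1, as reported for lambda light chain subtype, indicates lower odds of discordance in the exposed group.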
ML consistently outperformed LR in predicting cardiac and renal responses. For cardiac response, XGBoost achieved higher AUC and accuracy at all time points, with the largest improvements at 6 months (AUC 0.83 vs. 0.61; accuracy 85.4% vs. 63.0%) and 18 months (AUC 0.84 vs. 0.67; accuracy 86.8% vs. 69.8%). For renal response at 6 months, XGBoost achieved an AUC of 0.77 and an accuracy of 81.9%, compared with an AUC of 0.68 and an accuracy of 70.8% for LR. This trend persisted at 12, 18, and 24 months. As an example, for 6-month cardiac response, ML was 94% sensitive and 73% specific vs. 67% sensitive and 56% specific for LR. SHAP summary plots identified 24-hour urine protein (SHAP = 0.43), serum M-protein (0.33), age (0.32), dFLC (0.20), and NT-proBNP (0.16) as the most impactful predictors of response. A predictive scoring system based on maximization of the Youden Index and proportional weighting of each variable is under development. Validation of this predictive score, detailed model input, and response predictions will be presented.
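The Youden Index is J = sensitivity + specificity − 1, and the cutoff that maximizes J is a common way to dichotomize a continuous score. The sketch below shows that calculation under stated assumptions: the function names are hypothetical, a score at or above the cutoff is treated as a positive prediction, and it does not reproduce the scoring system under development.

```python
from typing import Sequence, Tuple


def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden Index: J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def best_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing J; predict positive when score >= cutoff.

    labels are 1 for responders and 0 for non-responders. Ties in J are
    resolved in favor of the lowest cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = youden_j(tp / n_pos, tn / n_neg)
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Applied to the reported 6-month cardiac operating points:
j_ml = youden_j(0.94, 0.73)  # approximately 0.67
j_lr = youden_j(0.67, 0.56)  # approximately 0.23
```

On the reported sensitivity and specificity, J is roughly 0.67 for ML and 0.23 for LR, which summarizes the same gap seen in AUC and accuracy.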
Conclusions A substantial subset of patients with AL amyloidosis experiences discordant outcomes, failing to achieve organ response despite deep hematologic remission. Clinical risk factors for discordance include male sex, Black race, and hospitalization during treatment. ML provided superior predictive performance compared with logistic regression, with SHAP values supporting interpretability, and may facilitate earlier recognition of patients at risk for discordance, enabling the development of prognostic models and more personalized treatment approaches.